METHOD FOR CONSTRUCTING A LIBRARY USING DNA SHUFFLING
FIELD OF THE INVENTION

The present invention relates to optimizing DNA sequences
in order to (a) improve the properties of a protein of interest
by artificial generation of genetic diversity of genes encoding
proteins having a biological activity of interest, by use of
the so-called gene or DNA shuffling technique to create a
large library of "genes", expressing said library of genes in a
suitable expression system and screening the expressed proteins
in respect of specific characteristics to determine such proteins
exhibiting desired properties, or (b) improve the properties
of regulatory elements such as promoters or terminators by
generation of a library of these elements, transforming suitable
hosts therewith in operable conjunction with a structural
gene, expressing said structural gene and screening for desirable
properties in the regulatory elements.






BACKGROUND OF THE INVENTION



It is generally found that a protein performing a certain
bioactivity exhibits a variation between genera, and even
between members of the same species differences may exist.

This variation is of course even more pronounced at the genus
level.

This natural genetic diversity among genes coding for
proteins having basically the same bioactivity has been generated
in Nature over billions of years and reflects a natural
optimization of the proteins coded for in respect of the
environment of the organism in question.
In today's society the conditions of life are vastly removed
from the natural environment, and it has been found that
the naturally occurring bioactive molecules are not optimized
for the various uses to which they are put by mankind,
especially when they are used for industrial purposes.

It has therefore been of interest to industry to identify
bioactive proteins that exhibit optimal properties in respect
of the use for which they are intended.

This has for many years been done by screening of natural
sources, or by use of mutagenesis. For instance, within the
technical field of enzymes for use in e.g. detergents, the
washing and/or dishwashing performance of e.g. naturally
occurring proteases, lipases, amylases and cellulases has been
improved significantly by in vitro modifications of the enzymes.

In most cases these improvements have been obtained by 
site-directed mutagenesis resulting in substitution, deletion 
or insertion of specific amino acid residues which have been 
chosen either on the basis of their type or on the basis of 
their location in the secondary or tertiary structure of the 
mature enzyme (see for instance US patent no. 4,518,534). 

In this manner novel polypeptide variants and mutants,
such as novel modified enzymes with altered characteristics,
e.g. specific activity, substrate specificity, thermal, pH and
salt stability, pH optimum, pI, Vmax etc., have successfully
been prepared to obtain polypeptides with improved properties.

For instance, within the technical field of enzymes the
washing and/or dishwashing performance of e.g. proteases,
lipases, amylases and cellulases has been improved
significantly.
An alternative general approach for modifying proteins
and enzymes has been based on random mutagenesis, for instance,
as disclosed in US 4,894,331 and WO 93/01285.

As it is a cumbersome and time-consuming process to obtain
polypeptide variants or mutants with improved functional
properties, a few alternative methods for rapid preparation of
modified polypeptides have been suggested.

Weber et al., (1983), Nucleic Acids Research, vol. 11,
5661-5669, describes a method for modifying genes by in vivo
recombination between two homologous genes. A linear DNA
sequence comprising a plasmid vector flanked by a DNA sequence
encoding alpha-1 human interferon at the 5'-end and a DNA
sequence encoding alpha-2 human interferon at the 3'-end is
constructed and transfected into a recA-positive strain of
E. coli. Recombinants were identified and isolated using a
resistance marker.

Pompon et al. (1989), Gene 83, p. 15-24, describes a
method for shuffling mammalian cytochrome P-450 genes by in vivo
recombination of partially homologous sequences in
Saccharomyces cerevisiae, by transforming Saccharomyces
cerevisiae with a linearized plasmid with filled-in ends and a
DNA fragment being partially homologous to the ends of said
plasmid.

In WO 97/07205 a method is described whereby polypeptide
variants are prepared by shuffling different nucleotide
sequences of homologous DNA sequences by in vivo recombination
using plasmid DNA as template.

US patent no. 5,092,… (Assignee: Genencor Int. Inc.)
discloses a method for preparing hybrid polypeptides by in vivo
recombination. Hybrid DNA sequences are produced by forming a
circular vector comprising a replication sequence, a first DNA
sequence encoding the amino-terminal portion of the hybrid
polypeptide, and a second DNA sequence encoding the
carboxy-terminal portion of said hybrid polypeptide. The
circular vector is transformed into a rec-positive
microorganism in which the circular vector is amplified. This
results in recombination of said circular vector mediated by
the naturally occurring recombination mechanism of the
rec-positive microorganism; such microorganisms include
prokaryotes such as Bacillus and E. coli, and eukaryotes such
as Saccharomyces cerevisiae.

One method for the shuffling of homologous DNA sequences
has been described by Stemmer (Stemmer, (1994), Proc. Natl.
Acad. Sci. USA, Vol. 91, 10747-10751; Stemmer, (1994), Nature,
vol. 370, 389-391). The method concerns shuffling homologous
DNA sequences by using in vitro PCR techniques. Positive
recombinant genes containing shuffled DNA sequences are selected
from a DNA library based on the improved function of the
expressed proteins.

The above method is also described in WO 95/22625. WO
95/22625 relates to a method for shuffling of homologous DNA
sequences. An important step in the method described in WO
95/22625 is to cleave the homologous template double-stranded
polynucleotide into random fragments of a desired size, followed
by homologous reassembly of the fragments into full-length
genes.

A disadvantage inherent to the method of WO 95/22625 is
that the diversity generated through that method is limited due
to the use of homologous gene sequences (as defined in
WO 95/22625).

Another disadvantage in the method of WO 95/22625 lies in
the […] reduction of […] template double-stranded […] fragments
by the cleavage of […].

Further reference […] by recombination of specific
DNA sequences, so-called design elements (DE), either by
recombination of synthesized double-stranded fragments or by
recombination of PCR-generated sequences, to produce so-called
functional elements (FE) comprising at least two of the design
elements. According to the method described in WO 95/17413 the
recombination has to be performed among design elements that
have DNA sequences with sufficient sequence homology to enable
hybridization of the different sequences to be recombined.

WO 95/17413 therefore also entails the disadvantage that 
the diversity generated is relatively limited. Furthermore the 
methods described are time consuming, expensive, and not suited 
for automation. 

Despite the existence of the above methods there is still
a need for better iterative in vivo recombination methods for
preparing novel polypeptide variants. Such methods should also
be capable of being performed in small volumes, and be amenable
to automation.

Furthermore, there is also a need for methods providing
the possibility of shuffling genes with relatively low
homology.



SUMMARY OF THE INVENTION

The present invention relates to a method for the
construction of a library of recombined polynucleotides from a
number of different starting single or double stranded parental
DNA templates, wherein said starting single or double stranded
parental DNA templates represent discrete points in a population
of genes encoding evolutionary or synthetic homologues of a
peptide having homologies ranging over a broad spectrum from
less than 15% to more than 80%, said population exhibiting at
least one identification sequence, and whereby said genes are
subjected to a gene shuffling procedure to generate shuffled
mutants of said population of genes representing additional
discrete points between those of said starting templates.

The gene shuffling procedure to be used according to the 
invention can be any suitable method such as those described 
above or a procedure as described in our co-pending patent ap- 
plication filed on the same date, and outlined below. 

According to that procedure template shifts of newly syn- 
thesized DNA strands during in vitro DNA synthesis are utilized 
to achieve DNA shuffling. 

In a further aspect the invention relates to a method of
identifying polypeptides exhibiting improved properties in
comparison to naturally occurring polypeptides of the same
bioactivity, whereby a library of recombined polynucleotides
produced by the above process is cloned into an appropriate
vector, said vector is then transformed into a suitable host
system to be expressed into the corresponding polypeptides,
said polypeptides are then screened in a suitable assay, and
positive results are selected.



In a still further aspect the invention relates to a
method for producing a polypeptide of interest as identified in
the preceding process, whereby a vector comprising a
polynucleotide encoding said polypeptide is transformed into a
suitable host, said host is grown to express said polypeptide,
and the polypeptide is recovered and purified.



DEFINITIONS 

Prior to discussing this invention in further detail, the
following terms will first be defined.

"Shuffling": The term "shuffling" herein means recombination
of nucleotide sequence fragment(s) between two or more
polynucleotides, resulting in output polynucleotides (i.e.
polynucleotides having been subjected to a shuffling cycle)
having a number of nucleotide fragments exchanged, in comparison
to the input polynucleotides (i.e. starting point
polynucleotides).

"Homology of DNA sequences or polynucleotides": In the
present context the degree of DNA sequence homology is
determined as the degree of identity between two sequences
indicating a derivation of the first sequence from the second.
The homology may suitably be determined by means of computer
programs known in the art, such as GAP provided in the GCG
program package (Program Manual for the Wisconsin Package,
Version 8, August 1994, Genetics Computer Group, 575 Science
Drive, Madison, Wisconsin, USA 53711) (Needleman, S.B. and
Wunsch, C.D., (1970), Journal of Molecular Biology, 48,
443-453).

"Homologous": The term "homologous" means that one
single-stranded nucleic acid sequence may hybridize to a
complementary single-stranded nucleic acid sequence. The degree
of hybridization may depend on a number of factors including the
amount of identity between the sequences and the hybridization
conditions such as temperature and salt concentration, as
discussed later (vide infra).

Using the computer program GAP (vide supra) with the
following settings for DNA sequence comparison: GAP creation
penalty of 5.0 and GAP extension penalty of 0.3, it is in the
present context believed that two DNA sequences will be able to
hybridize (using low stringency hybridization conditions as
defined below) if they mutually exhibit a degree of identity
preferably of at least 70%, more preferably at least 80%, and
even more preferably at least […].

"Heterologous": If two or more DNA sequences mutually
exhibit a degree of identity which is less than specified above,
they are in the present context said to be "heterologous".
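The identity criterion above can be illustrated with a small global alignment. The sketch below is not the GCG GAP program; it is a minimal Gotoh-style affine-gap Needleman-Wunsch alignment, assuming a match score of 1, a mismatch score of 0, and a gap of length k costing 5.0 + 0.3 * k (the creation and extension penalties given above). Percent identity is taken here as identical columns divided by alignment length, which is an assumption; GAP's own definition may differ. The 70% cut-off is the lowest preferred value stated above, and the names `gap_align`, `percent_identity` and `classify` are hypothetical.

```python
from typing import List, Tuple

NEG = float("-inf")


def _argmax(values: List[float]) -> int:
    """Index of the first maximal value (ties favour M, then X, then Y)."""
    best = 0
    for k in range(1, len(values)):
        if values[k] > values[best]:
            best = k
    return best


def gap_align(a: str, b: str, match: float = 1.0, mismatch: float = 0.0,
              gap_open: float = 5.0, gap_ext: float = 0.3) -> Tuple[str, str]:
    """Global alignment with affine gaps (Gotoh's variant of Needleman-Wunsch).

    A gap of length k costs gap_open + gap_ext * k. State 0 (M) ends in an
    aligned residue pair, state 1 (X) in a residue of `a` against a gap,
    state 2 (Y) in a gap against a residue of `b`.
    """
    n, m = len(a), len(b)
    score = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    back = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    score[0][0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = [score[k][i - 1][j - 1] for k in range(3)]
                k = _argmax(prev)
                s = match if a[i - 1] == b[j - 1] else mismatch
                score[0][i][j], back[0][i][j] = prev[k] + s, k
            if i > 0:
                prev = [score[0][i - 1][j] - gap_open - gap_ext,
                        score[1][i - 1][j] - gap_ext,
                        score[2][i - 1][j] - gap_open - gap_ext]
                k = _argmax(prev)
                score[1][i][j], back[1][i][j] = prev[k], k
            if j > 0:
                prev = [score[0][i][j - 1] - gap_open - gap_ext,
                        score[1][i][j - 1] - gap_open - gap_ext,
                        score[2][i][j - 1] - gap_ext]
                k = _argmax(prev)
                score[2][i][j], back[2][i][j] = prev[k], k
    # Trace back from the best-scoring state in the bottom-right cell.
    i, j = n, m
    state = _argmax([score[k][n][m] for k in range(3)])
    top, bottom = [], []
    while i > 0 or j > 0:
        prev_state = back[state][i][j]
        if state == 0:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif state == 1:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        state = prev_state
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a: str, b: str) -> float:
    """Identical columns / alignment length * 100 (assumed definition).

    Both sequences must be non-empty.
    """
    top, bottom = gap_align(a, b)
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


def classify(a: str, b: str, threshold: float = 70.0) -> str:
    """'homologous' at or above the identity threshold, else 'heterologous'."""
    return "homologous" if percent_identity(a, b) >= threshold else "heterologous"
```

For example, ACGT against AGT aligns as ACGT / A-GT, which this definition scores as 75% identity.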



WO 98/41622 






PCT/DK98/00103 



"Hybridization": Suitable experimental conditions for
determining whether two or more DNA sequences of interest
hybridize are herein defined as hybridization at low
stringency, as described in detail below.

Molecules to which the oligonucleotide probe hybridizes
under these conditions are detected using an X-ray film or a
phosphoimager.

"Primer": The term "primer" used herein, especially in
connection with a PCR reaction, is an oligonucleotide
(especially a "PCR primer") defined and constructed according
to general standard specifications known in the art ("PCR A
practical approach", IRL Press, 1991).

"A primer directed to a sequence": The term "a primer
directed to a sequence" means that the primer (preferably to be
used in a PCR reaction) is constructed to exhibit at least […]
degree of sequence identity to the sequence part of interest,
more preferably at least […] degree of sequence identity to the
sequence part of interest, to which said primer consequently is
"directed". The primer is designed to anneal specifically, at a
given temperature, to the region towards which it is directed.
Identity at the 3' end of the primer is especially essential for
the function of the polymerase, i.e. the ability of a
polymerase to extend the annealed primer.

"Flanking": The term "flanking" used herein in connection
with DNA sequences comprised in a PCR fragment means the
outermost partial sequences of the PCR fragment, at both the 5'
and 3' ends of the PCR fragment.

"Polypeptide": Polymers of amino acids, sometimes referred
to as proteins. The sequence of amino acids determines the
folded conformation that the polypeptide assumes, and this in
turn determines biological properties such as activity. Some
polypeptides consist of a single polypeptide chain (monomeric),
whilst others comprise several associated polypeptides
(multimeric). All enzymes and antibodies are polypeptides.

"Enzyme": A protein capable of catalysing chemical
reactions. Specific types of enzymes are a) hydrolases including
amylases, cellulases and other carbohydrases, proteases, and
lipases, b) oxidoreductases, c) ligases, d) lyases, e)
isomerases, f) transferases, etc. Of specific interest in
relation to the present invention are enzymes used in
detergents, such as proteases, lipases, cellulases, amylases,
etc.
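The definition of "a primer directed to a sequence" given above combines an overall identity requirement with stricter identity at the 3' end. A minimal sketch of such a check follows. The function name and the defaults `min_identity=0.8` and `exact_3prime=3` are hypothetical illustration values, not figures from this description; the 10-75 bp length window is the usual primer length given later in this description under "PCR primers".

```python
def primer_directed_to(primer: str, site: str, min_identity: float = 0.8,
                       exact_3prime: int = 3) -> bool:
    """Hypothetical check that `primer` is 'directed to' `site`.

    `site` is the target region written 5'->3' on the strand the primer is
    meant to be identical to. The primer must be 10-75 nt long, reach
    `min_identity`, and match exactly over its last `exact_3prime` bases,
    because the polymerase extends the annealed primer from its 3' end.
    """
    if len(primer) != len(site):
        raise ValueError("primer and target site must have equal length")
    if not 10 <= len(primer) <= 75:
        return False
    primer, site = primer.upper(), site.upper()
    identical = sum(p == s for p, s in zip(primer, site))
    if identical / len(primer) < min_identity:
        return False
    # A mismatch at the 3' terminus blocks extension, so require identity there.
    return primer[-exact_3prime:] == site[-exact_3prime:]
```

A mismatch at the 5' end is tolerated as long as overall identity stays above the threshold, whereas a single 3'-terminal mismatch rejects the primer.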



DETAILED DESCRIPTION OF THE INVENTION 

All possible genes encoding a polypeptide of the same
evolutionary origin can be seen as a very large population of
DNA sequences (e.g. the set of genes encoding a serine
protease). It has been found that the homology between the
polypeptides encoded by single members of such a population may
be as low as less than 15% (the genes originating from
"distant" organisms).

When searching for polypeptides suited for the various
purposes that mankind has developed, it has been found
difficult, if not impossible at our present level of knowledge,
to conclude in a rational manner on the optimal configuration of
the polypeptide in question. Therefore it was found desirable
to provide a simple method of generating a sub-population of
the above mentioned very large population, but representing a
substantial part of the variation possible within the large
population.



The object of the present invention is thus to provide a
method whereby it is possible to shuffle components of genes
encoding polypeptides of the same functionality but having
only low homologies.

To this end it is necessary to obtain a reasonable knowl- 
edge of the population in question, meaning having at disposal 
a number of individual members ( e.g. 5, 10, 15 or more mem- 
bers) representing as high a variation as possible. This small 
sub-population is then used as a starting point for generating 
a much larger sub-population of genes. The corresponding 
polypeptides of the large sub-population obtained are then dis- 
played and screened in an appropriate manner to identify such 
members of the large sub-population that are optimal for the 

intended purpose. 

It was found that the expansion of the starting sub- 
population to the large sub-population could be accomplished 
using gene shuffling methods. 

Such methods as described in the literature provide means
to exchange DNA fragments between genes coding for polypeptides
of a reasonably high homology, typically above 80%, resulting
in the generation of novel genes encoding polypeptides having
homologies between 80% and 99%.

It was also found that in the method of the invention it
was necessary, as starting population, to use genes encoding
polypeptides that are at least 70% to 80% homologous to at
least one other gene in the starting population.

According to the invention it is thus important to start
from a population or sub-set of genes which comprises
intermediate sequences ranging from genes being rather similar
to genes being rather dissimilar, but still having the same
evolutionary origin (function). Only then is a shuffling of
even rather heterologous sequences feasible. The stepwise
shuffling of first quite homologous genes creates new species
which are […], and which, in the subsequent shuffling rounds,
will recombine with each other and with other more heterologous
genes from the starting population, and so on.

Finally, hybrids are generated in which sequence parts
from very heterologous starting genes can be found. These
retrieved starting genes would never have been shuffled without
the intermediate species in the starting population, because of
too large a "sequence space" distance.
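The requirement that heterologous starting genes be bridged by intermediate sequences can be viewed as a graph-connectivity question: genes are nodes, and two genes are linked when their pairwise identity reaches the threshold. The sketch below assumes pairwise identities have already been computed (for example with an alignment program such as GAP) and uses 70%, the lower end of the 70% to 80% range stated above, as a default. The function name and data layout are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List


def linked_by_intermediates(genes: List[str],
                            identity: Dict[FrozenSet[str], float],
                            threshold: float = 70.0) -> bool:
    """True if every gene can be reached from every other through a chain of
    gene pairs whose percent identity is at least `threshold`.

    `identity` maps a two-element frozenset of gene names to percent identity;
    pairs mentioning genes outside `genes` are ignored.
    """
    if not genes:
        return False
    neighbours = {g: set() for g in genes}
    for pair, pid in identity.items():
        a, b = tuple(pair)
        if pid >= threshold and a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Breadth-first search from the first gene.
    seen = {genes[0]}
    queue = deque([genes[0]])
    while queue:
        g = queue.popleft()
        for h in neighbours[g] - seen:
            seen.add(h)
            queue.append(h)
    return len(seen) == len(genes)
```

Two genes that share only 40% identity are not linked on their own, but become part of one shuffleable population when an intermediate gene shares at least 70% with each of them.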

Having this condition fulfilled, it was found possible to
generate genes encoding novel functional polypeptides having a
homology as low as the minimum degree of homology represented in
the starting population. In principle the homology range in the
final population may be even greater than that of the starting
population.

The present invention relates in its first aspect to a
method for the construction of a library of recombined
polynucleotides from a number of different starting single or
double stranded parental DNA templates, wherein said starting
single or double stranded parental DNA templates represent
discrete points in a population of genes encoding evolutionary
or synthetic homologues of a peptide having homologies ranging
over a broad spectrum from less than 15% to more than 80%, said
population exhibiting at least one identification sequence, and
whereby said genes are subjected to a gene shuffling procedure
to generate shuffled mutants of said population of genes
representing additional discrete points between those of said
starting templates.

According to the invention it is possible to use parental
DNA templates representing homologies ranging from less than
45%, 40%, 35%, 30%, 25%, 20% or 15% to more than 80%, 85%,
90%, 95% or 99%.




In specific embodiments at least one identification se- 
quence is identified and primers constructed to anneal thereto. 
These sequences can be located anywhere on the genes. 

In a preferred embodiment at least two identification
sequences are identified. These sequences can be located at any
distance from each other, but it is preferred that they are
located as far as possible from each other on the genes.

According to these embodiments said identification
sequences may correspond to an amino acid sequence of from 4 to
8 amino acid residues, preferably from 5 to 7 amino acid
residues, which sequence is highly conserved among the peptides
encoded by the collection of starting single or double stranded
parental DNA templates.

It is preferred that the identification sequences are
located a distance apart corresponding to the average size of
the genes in said collection, with a variation of up to 40%. The
further apart the sequences are, the larger the part of the gene
that is shuffled.
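Locating identification sequences, as described above, amounts to finding short stretches that are identical in all starting peptides and, preferably, picking two that lie far apart. The sketch below assumes the peptides are already aligned to a common length (gap character "-") and uses a window of 6 residues, inside the preferred 5 to 7 range; the function names are hypothetical, and a real design would also check the spacing against the average gene size.

```python
from typing import List, Optional, Tuple


def conserved_windows(aligned: List[str], k: int = 6) -> List[int]:
    """Start positions of length-k windows identical in every aligned sequence."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to a common length")
    starts = []
    for i in range(length - k + 1):
        window = aligned[0][i:i + k]
        if "-" not in window and all(s[i:i + k] == window for s in aligned[1:]):
            starts.append(i)
    return starts


def outermost_pair(aligned: List[str], k: int = 6) -> Optional[Tuple[int, int]]:
    """The two non-overlapping conserved windows lying furthest apart, if any."""
    starts = conserved_windows(aligned, k)
    if len(starts) < 2 or starts[-1] - starts[0] < k:
        return None
    return starts[0], starts[-1]
```

Primers would then be designed against the DNA encoding the two returned windows, so that the stretch between them is the part that gets shuffled.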

However, situations may arise where it is desired only to
shuffle the sequences between identification sequences located
quite close to each other.

As indicated above, the gene shuffling method used in the
method of the invention is of little or no significance. In
principle any method will work. Thus the methods disclosed in
WO 95/22625 and WO 95/17413 are fully operable in the present
invention. Details showing how these methods may be used for
practising the present invention are indicated in the Examples
below.

Furthermore, the gene shuffling methods described in
co-filed patent applications are also contemplated for use in
the present invention.

According to one of these procedures, template shifts of
newly synthesized DNA strands during in vitro DNA synthesis are
utilized to achieve DNA shuffling.

More specifically, that method provides for the
construction of a library of recombined homologous
polynucleotides from a number of different starting single or
double stranded parental DNA templates and primers by induced
template shifts during an in vitro polynucleotide synthesis
using a polymerase, whereby

A. extended primers or polynucleotides are synthesized by

a) denaturing parental double stranded DNA templates to
produce single stranded templates,

b) annealing said primers to the single stranded DNA
templates,

c) extending said primers by initiating synthesis by use
of said polymerase,

d) causing arrest of the synthesis, and

e) denaturing the double strand to separate the extended
primers from the templates,



B. a template shift is induced by

a) isolating the newly synthesized single stranded
extended primers from the templates and repeating steps A.b) to
A.e) using said extended primers produced in (A) as both
primers and templates, or

b) repeating steps A.b) to A.e),

C. the above process is terminated after an appropriate
number of cycles of process steps A. and B.a), A. and B.b), or
combinations thereof, and

optionally the produced polynucleotides are amplified in a
standard PCR reaction with specific primers to selectively
amplify polynucleotides of interest.
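The effect of repeated template shifts can be illustrated with a toy simulation: in each cycle the growing strand anneals to a randomly chosen parental template, is extended by a random stretch copied from that template, and synthesis is then arrested, so the final strand is a mosaic of parental segments. This is a deliberately simplified model, not the described procedure itself: it assumes pre-aligned parents of equal length with no insertions or deletions, assumes annealing always succeeds regardless of local homology, and the extension lengths (`min_ext`, `max_ext`) are hypothetical parameters.

```python
import random
from typing import List, Tuple


def template_shift_shuffle(parents: List[str], rng: random.Random,
                           min_ext: int = 20,
                           max_ext: int = 60) -> Tuple[str, List[int]]:
    """Toy model of shuffling by repeated template shifts.

    Each cycle the growing strand anneals to a randomly chosen parent
    (steps A.b / B), is extended by a random number of bases copied from that
    parent (step A.c) and is then arrested (step A.d). Parents are assumed to
    be pre-aligned and of equal length, and annealing is assumed to succeed.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model needs aligned parents of equal length")
    if min_ext < 1 or max_ext < min_ext:
        raise ValueError("need 1 <= min_ext <= max_ext")
    strand = ""
    origins: List[int] = []  # index of the template used in each cycle
    while len(strand) < length:
        k = rng.randrange(len(parents))
        ext = rng.randint(min_ext, max_ext)
        strand += parents[k][len(strand):len(strand) + ext]
        origins.append(k)
    return strand, origins
```

With parents made of different letters, the output shows directly which parent each stretch was copied from; with a single parent the procedure simply reproduces that parent.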
In a further specific embodiment the gene shuffling is
performed by the method described in our co-filed application,
whereby conserved regions of heterologous DNA sequences are
identified for shuffling of heterologous DNA sequences of
interest having at least one conserved region, comprising the
following steps:



i) One or more conserved region(s) (designated "A", "B",
"C", etc.) in two or more of the heterologous sequences are
identified.



ii) Two sets of PCR primers (each set comprising a sense and
an anti-sense primer) for one or more conserved region(s)
identified in (i) are constructed.

In these primers, one set (named: "a" = sense primer;
"a'" = anti-sense primer) is directed to a sequence region 5'
(sense strand) of the conserved region (e.g. conserved region
"A"), and the second set (named: "b" = sense primer; "b'" =
anti-sense primer) is directed to a sequence region 3' (sense
strand) of the conserved region (e.g. conserved region "A"),
and the anti-sense primer "a'" and the sense primer "b" have a
homologous sequence overlap of at least 10 base pairs (bp)
within the conserved region.

iii) For one or more identified conserved regions of
interest in step (i), two PCR amplification reactions are
performed using the heterologous DNA sequences from step (i) as
templates, whereby one of the PCR reactions uses the 5' primer
set identified in step (ii) (e.g. named "a", "a'") and the
second PCR reaction uses the 3' primer set identified in step
(ii) (e.g. named "b", "b'").



iv) The PCR fragments generated as described in step (iii)
for one or more of the conserved regions identified in step (i)
are isolated.
v) Two or more PCR fragments isolated in step (iv) are
pooled, and a sequence overlap extension PCR reaction (SOE-PCR)
is performed using the pooled PCR fragments as templates.


vi) The PCR fragment(s) obtained in step (v) are isolated,
whereby the isolated PCR fragments comprise numerous different
shuffled sequences containing a shuffled mixture of the PCR
fragments isolated in step (iv).
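Steps (i) to (vi) can be pictured as follows: every parent is split at the conserved region into a 5' PCR product (primers "a"/"a'") and a 3' PCR product (primers "b"/"b'"), and SOE-PCR can join any 5' product to any 3' product through the shared overlap, so n parents give up to n x n fused sequences. The sketch below is a simplification that assumes "a'" ends exactly at the 3' boundary of the conserved region and "b" starts exactly at its 5' boundary, so the whole conserved region (at least 10 bp) serves as the overlap; the function name is hypothetical.

```python
from itertools import product
from typing import List, Set


def soe_shuffle(parents: List[str], conserved: str) -> Set[str]:
    """All products of fusing any 5' fragment with any 3' fragment via the overlap."""
    if len(conserved) < 10:
        raise ValueError("the overlap should be at least 10 bp")
    fragments_5, fragments_3 = [], []
    for p in parents:
        pos = p.find(conserved)
        if pos < 0:
            raise ValueError("conserved region not found in a parent")
        # 5' fragment ("a" to "a'"): upstream sequence ending in the conserved region.
        fragments_5.append(p[:pos + len(conserved)])
        # 3' fragment ("b" to "b'"): starts in the conserved region, runs downstream.
        fragments_3.append(p[pos:])
    # SOE-PCR: fragments anneal through the shared overlap and are extended into one.
    return {f5 + f3[len(conserved):] for f5, f3 in product(fragments_5, fragments_3)}
```

Besides the two parental sequences, the pooled reaction yields the two chimeras in which the 5' part of one parent is joined to the 3' part of the other.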


In specific embodiments various modifications can be made
in the process of the invention. For example, it is advantageous
to apply a defective polymerase, either an error-prone
polymerase to introduce mutations in comparison to the
templates, or a polymerase that will discontinue the
polynucleotide synthesis prematurely to effect the arrest of
the reaction.

According to a specific embodiment the peptide is a pro- 
tease, especially a subtilase. 

In the case of a subtilase, identification sequences may
be located around the aspartic acid in position 32, or the
histidine in position 64 and the active serine in position 221
of subtilisin BPN'.

In a further embodiment the peptide is an amylase,
especially an α-amylase.
In that case identification sequences may be located
around the Asp in position […] and the Asp in position 328 of
B. licheniformis α-amylase.

For α-amylases from Bacillus species the identification
sequences may preferentially be located around Tyr in position
33[…] and around Ser in position […].

In further embodiments the peptide is a lipase or a
cellulase.



In respect of lipases, suitable identification sequences
may be found by using the lipase alignment shown in A. Svendsen
et al. (1995): Biochemical properties of cloned lipases from
the Pseudomonas family, Biochimica et Biophysica Acta 1259,
5-17. Examples could be around the Pro in position 10 or around
the His in position 285 (using P. glumae lipase numbering).

In respect of cellulases, in particular cellulases from
family 45 (see WO 96/29397), suitable identification sequences
may be the conserved region "Thr Arg Tyr Trp Asp Cys Cys Lys
Pro/Thr" and the conserved region "Trp Arg Phe/Tyr Asp Trp
Phe". For further details relating to those cellulase
identification sequences reference is made to PCT/DK97/00216,
in particular Example 3 of PCT/DK97/00216.

In respect of xylanases, in particular xylanases from
family 11, suitable identification sequences may be the
conserved regions "DGGT[…]" and "ZGYQSSG". For further details
relating to those xylanase identification sequences reference
is made to PCT/DK97/00216, in particular Examples 1 and 2 of
PCT/DK97/00216.



PCR primers:



The PCR primers are constructed according to the standard
descriptions in the art. Normally they are 10-75 base pairs
(bp) long. However, for the specific embodiment using random or
semi-random primers the length may be substantially longer, as
indicated above.



PCR reaction:

If not otherwise mentioned, the PCR reaction performed
according to the invention is performed according to standard
protocols known in the art.
The term "isolation of PCR fragment" is intended to be broad
enough to cover simply an aliquot containing the PCR fragment.
Preferably, however, the PCR fragment is isolated to an extent
which removes surplus primers, nucleotides, templates etc.
In an embodiment of the invention the DNA fragment(s)
is (are) prepared under conditions resulting in a low, medium or
high random mutagenesis frequency.

To obtain a low mutagenesis frequency the DNA sequence(s)
(comprising the DNA fragment(s)) may be prepared by a standard
PCR amplification method (US 4,683,202 or Saiki et al., (1988),
Science 239, 487-491).

A medium or high mutagenesis frequency may be obtained by
performing the PCR amplification under conditions which
increase the misincorporation of nucleotides, for instance as
described by Deshler, (1992), GATA 9(4), 103-106; Leung et al.,
(1989), Technique, Vol. 1, No. 1, 11-15.
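The difference between low, medium and high mutagenesis frequency can be summarized as a per-base substitution rate: for a sequence of length L copied at rate r, about r * L substitutions are expected (e.g. about 5 for 1000 bp at r = 0.005). The sketch below simulates independent random base substitutions at a chosen rate. It is only an illustrative model: real error-prone PCR has biased transition and transversion spectra and also produces small insertions and deletions, none of which are modelled, and the function names are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Replace with one of the other bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def count_substitutions(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))
```

Choosing a rate per experiment then corresponds to choosing between standard PCR conditions (very low rate) and the misincorporation-promoting conditions cited above (higher rates).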

It is also contemplated according to the invention to
combine the PCR amplification (i.e. according to this
embodiment also DNA fragment mutation) with a mutagenesis step
using a suitable physical or chemical mutagenizing agent, e.g.
one which induces transitions, transversions, inversions,
scrambling, deletions and/or insertions.

Expressing the recombinant protein from the recombinant
shuffled sequences

Expression of the recombinant protein encoded by the
shuffled sequence in step vi) of the second and third aspect of
the present invention may be performed by use of standard
expression vectors and corresponding expression systems known
in the art.




Screening and selection 

In the context of the present invention the term
"positive polypeptide variants" means resulting polypeptide
variants possessing functional properties which have been
improved in comparison to the polypeptides producible from the
corresponding input DNA sequences. Examples of such improved
properties can be as different as e.g. enhanced or lowered
biological activity, increased wash performance,
thermostability, oxidation stability, substrate specificity,
antibiotic resistance etc.

Consequently, the screening method to be used for
identifying positive variants depends on which property of the
polypeptide in question it is desired to change, and in what
direction the change is desired.

A number of suitable screening or selection systems to
screen or select for a desired biological activity are
described in the art. Examples are:

Strausberg et al. (Biotechnology 13: 669-673 (1995))
describes a screening system for subtilisin variants having
calcium-independent stability;

Bryan et al. (Proteins 1: 326-334 (1986)) describes a
screening assay for proteases having enhanced thermal
stability; and

PCT/DK95/00322 describes a screening assay for lipases
having an improved wash performance in washing detergents.

An embodiment of the invention comprises screening or
selection of recombinant protein(s), wherein the desired
biological activity is performance in dish-wash or laundry
detergents. Examples of suitable dish-wash or laundry
detergents are disclosed in PCT/DK96/00322 and 95/30011.
If the improved functional property of the polypeptide is
not sufficiently good after one cycle of shuffling, the
polypeptide may be subjected to another cycle.
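The shuffle, screen and repeat logic described in the screening section can be written as a simple loop: shuffle the current pool, rank the library with the assay, keep the best variants as templates for the next cycle, and stop once the target is met or a round limit is reached. The sketch below is generic; `shuffle` and `assay` stand for whatever shuffling method and screening assay are used and are passed in as hypothetical callables, and `keep` and `max_rounds` are illustrative parameters.

```python
from typing import Callable, List, Tuple


def evolve(parents: List[str],
           shuffle: Callable[[List[str]], List[str]],
           assay: Callable[[str], float],
           target: float,
           max_rounds: int = 5,
           keep: int = 10) -> Tuple[str, float, int]:
    """Repeat shuffling cycles until a variant reaches `target` in the assay.

    Returns the best variant, its assay value and the number of cycles run.
    """
    pool = list(parents)
    best = max(pool, key=assay)
    rounds = 0
    while assay(best) < target and rounds < max_rounds:
        library = shuffle(pool)
        if not library:
            break
        rounds += 1
        # Positive variants become the templates for the next cycle.
        pool = sorted(library, key=assay, reverse=True)[:keep]
        if assay(pool[0]) > assay(best):
            best = pool[0]
    return best, assay(best), rounds
```

As a toy usage example, with sequences of A and C, an assay that counts C residues, and a shuffle that joins the first half of one sequence to the second half of another, the parents CCAA and AACC give CCCC after a single cycle.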

In an embodiment of the invention wherein polynucleotides
representing a number of mutations of the same gene are used as
templates, at least one shuffling cycle is a backcrossing cycle
with the initially used DNA fragment, which may be the
wild-type DNA fragment. This eliminates non-essential mutations.
Non-essential mutations may also be eliminated by using
wild-type DNA fragments as the initially used input DNA material.
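Why backcrossing removes non-essential mutations can be illustrated with a small simulation. It rests on a simplifying assumption: crossovers are frequent enough that each mutated position is inherited independently from the mutant or the wild type. The sequences and the "essential" position 7 are hypothetical, and `backcross` and `mutations` are illustrative helper names.

```python
import random


def backcross(mutant, wild_type, rng):
    """Recombine a mutant with the wild type, one position at a time.

    Simplifying assumption: each mutated position is inherited independently
    from either parent with equal probability.
    """
    return "".join(rng.choice((m, w)) if m != w else m for m, w in zip(mutant, wild_type))


def mutations(seq, wild_type):
    """1-based positions where seq differs from the wild type."""
    return [i for i, (s, w) in enumerate(zip(seq, wild_type), start=1) if s != w]


if __name__ == "__main__":
    rng = random.Random(1)
    wt = "AAAAAAAAAA"
    mutant = "AAGAAATAAC"  # hypothetical mutations at positions 3, 7 and 10
    offspring = [backcross(mutant, wt, rng) for _ in range(50)]
    # Hypothetical screen: only the mutation at position 7 improves the protein.
    hits = [s for s in offspring if s[6] == "T"]
    cleanest = min(hits, key=lambda s: len(mutations(s, wt)))
    print(mutations(cleanest, wt))
```

Under this model roughly one offspring in eight carries only the essential mutation, so screening a modest backcross library is usually enough to shed the passenger mutations.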

Also contemplated to be within the invention are
polypeptides having biological activity such as insulin, ACTH,
glucagon, somatostatin, somatotropin, thymosin, parathyroid
hormone, pituitary hormones, somatomedin, erythropoietin,
luteinizing hormone, chorionic gonadotropin, hypothalamic
releasing factors, antidiuretic hormones, thyroid stimulating
hormone, relaxin, interferon, thrombopoietin (TPO) and prolactin.

It is also contemplated according to the invention to
shuffle parental polynucleotides as indicated above originating
from wild-type organisms of different genera.

The starting parental DNA sequences may be any DNA
sequences, including wild-type DNA sequences, DNA sequences
encoding variants or mutants, or modifications thereof, such as
extended or elongated DNA sequences. They may also be the outcome
of DNA sequences having been subjected to one or more cycles of
shuffling (i.e. output DNA sequences) according to the method
of the invention or any other method (e.g. any of the methods
described in the prior art section), or synthetic or otherwise
mutagenized sequences.

When using one method of the invention, the resulting
recombined polynucleotides (i.e. shuffled DNA sequences) have
had a number of nucleotide fragments exchanged. This results in
replacement of at least one amino acid within the polypeptide
variant when compared with the parent polypeptide. It is to
be understood that silent exchanges are also contemplated (i.e.
nucleotide exchanges which do not result in changes in the
amino acid sequence).
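The distinction between silent exchanges and amino acid replacements can be checked by translating the parent and shuffled codons with the standard genetic code. This is a sketch assuming in-frame coding sequences of equal length (no insertions or deletions); `translate` and `classify_exchanges` are illustrative names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate complete codons of an in-frame DNA sequence."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def classify_exchanges(parent, variant):
    """List (codon_number, parent_codon, variant_codon, kind) for each changed codon.

    kind is 'silent' if the encoded amino acid is unchanged, else 'replacement'.
    Assumes equal-length, in-frame coding sequences.
    """
    if len(parent) != len(variant):
        raise ValueError("sequences must have equal length")
    result = []
    for i in range(0, len(parent) - len(parent) % 3, 3):
        p, v = parent[i:i + 3].upper(), variant[i:i + 3].upper()
        if p != v:
            kind = "silent" if CODON_TABLE[p] == CODON_TABLE[v] else "replacement"
            result.append((i // 3 + 1, p, v, kind))
    return result


if __name__ == "__main__":
    # Hypothetical example: CTG->CTC keeps Leu, AAA->AGA turns Lys into Arg.
    print(classify_exchanges("ATGCTGAAA", "ATGCTCAGA"))
    # [(2, 'CTG', 'CTC', 'silent'), (3, 'AAA', 'AGA', 'replacement')]
```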



MATERIALS AND METHODS
EXAMPLES
EXAMPLE 1 

Shuffling of a pool/population of evolutionary homologues
originating from bacterial hosts.

In this Example a gene shuffling method similar to the
one described in WO 95/22625 is used:

A population of subtilase-encoding genes or parts of such
genes is generated through isolation or by synthesis. Sources
for the genes may be as described in Siezen et al., Protein
Engineering 4 (1991) 719-737. The population may also comprise
genes encoding the pre-pro subtilases as defined in GenBank
entries A13050_1, D26542, A22550, Swiss-Prot entry SUBT_BACAM
P00782, and PD493 (Patent Application No. WO 95/34963) with
homologies (similarities) ranging from 32% to 64% as calculated
by the MegAlign software from DNASTAR Inc. (WI 53715, USA)
using the Clustal Method.
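The homology range quoted for the population is a minimum and maximum over all parent pairs. The sketch below assumes the sequences have already been aligned (it does not reproduce the Clustal alignment itself) and uses one common identity convention: identical residues divided by columns where neither sequence has a gap. MegAlign may normalise differently, and the toy alignment is hypothetical, not the subtilase sequences named above.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues / columns where neither sequence has a gap, in %.

    One common convention; other tools may normalise differently.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def identity_range(alignment):
    """Lowest and highest pairwise identity in a multiple alignment {name: aligned_seq}."""
    values = [percent_identity(alignment[x], alignment[y])
              for x, y in combinations(alignment, 2)]
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical toy alignment.
    toy = {"s1": "MKV-LA", "s2": "MKVQLG", "s3": "MRI-LA"}
    print(identity_range(toy))  # (40.0, 80.0)
```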



The substrates used in the shuffling reaction are repre-
histidine in position 64 of subtilisin BPN' and the serine in
position 221 of subtilisin BPN'. The template for this PCR can
either be plasmids containing cloned protease genes or
chromosomal DNA extracted from bacterial strains, e.g.
protease-secreting bacteria isolated from soil. The substrate
will typically be generated separately for all the templates
and pooled before the shuffling reaction.

The substrates are fragmented, e.g. by DNase I treatment
or shearing by sonication as described in WO 95/22625. The
generated fragments are separated according to size by agarose
gel electrophoresis, and fragments of the desired size, e.g.
from 10 to 50 bp, from 30 to 100 bp, from 50 to 150 bp, or
from 100 to 200 bp, are purified from the gel.
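Fragmentation followed by size selection can be mimicked in silico to estimate how much of a digest falls inside a chosen band. This is a crude sketch assuming cuts at uniformly random, distinct positions; it does not model DNase I sequence preference or digestion kinetics. The function names and the random 1 kb parents are hypothetical.

```python
import random


def fragment(seq, n_cuts, rng):
    """Cut seq at n_cuts distinct random positions (crude stand-in for DNase I)."""
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_bp, max_bp):
    """Keep fragments of min_bp..max_bp, like excising one band from a gel."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]


if __name__ == "__main__":
    rng = random.Random(42)
    # Three hypothetical 1 kb parental substrates, 20 copies of each digested.
    parents = ["".join(rng.choice("ACGT") for _ in range(1000)) for _ in range(3)]
    pool = [f for p in parents for _ in range(20) for f in fragment(p, 15, rng)]
    selected = size_select(pool, 50, 150)
    print(len(pool), len(selected))
```

Changing `n_cuts` shifts the fragment-length distribution, which is the in-silico counterpart of adjusting digestion time before choosing one of the size windows above.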

These fragments are reassembled by PCR as described in
WO 95/22625. Optionally, correctly assembled DNA fragments are
amplified by subjecting the product from the assembly reaction
to another PCR including two primers able to anneal to the ends
of correctly assembled fragments. The resulting fragments can
be cloned into suitable expression plasmids and subsequently
screened for a specific property, such as thermostability, using
assays well known in the art.
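The outcome of reassembly can be approximated by a template-switching model. This is a simplification, not a simulation of PCR thermodynamics: the product is extended one fragment at a time, and a switch to another parent is allowed only where both parents share `min_overlap` identical bases just before the switch point, standing in for annealing of overlapping fragments. It assumes equal-length, pre-aligned parents; `reassemble`, `frag_len` and `min_overlap` are hypothetical names.

```python
import random


def reassemble(parents, frag_len, min_overlap, rng):
    """Build one full-length chimera from equal-length, pre-aligned parents.

    Fragment lengths are drawn from frag_len = (lo, hi). At each fragment end,
    synthesis may switch to another parent, but only where both parents share
    min_overlap identical bases just before the switch point.
    Returns (product, crossover_positions).
    """
    length = len(parents[0])
    current = rng.randrange(len(parents))
    pieces, crossovers, pos = [], [], 0
    while pos < length:
        end = min(pos + rng.randint(*frag_len), length)
        pieces.append(parents[current][pos:end])
        if end < length and end >= min_overlap:
            other = rng.randrange(len(parents))
            window = slice(end - min_overlap, end)
            if other != current and parents[other][window] == parents[current][window]:
                current = other  # template switch inside a region of identity
                crossovers.append(end)
        pos = end
    return "".join(pieces), crossovers


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical parents differing at a few positions only.
    p1 = "ATGAAACGTCTGGCTTACGGTAAA"
    p2 = "ATGAAGCGTCTGGCATACGGCAAA"
    for _ in range(3):
        print(reassemble([p1, p2], (4, 8), 4, rng))
```

The model reflects why crossovers concentrate in regions of sequence identity: where parents diverge, no switch is permitted, so divergent blocks are inherited intact from one parent.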
PATENT CLAIMS



1. A method for the construction of a library of recombined
polynucleotides from a number of different starting single or
double stranded parental DNA templates, wherein said starting
single or double stranded parental DNA templates represent
discrete points in a population of genes encoding evolutionary or
synthetic homologues of a peptide having homologies ranging
over a broad spectrum from less than 15% to more than 80%, said
population exhibiting at least one identification sequence, and
whereby said genes are subjected to a gene shuffling procedure
to generate shuffled mutants of said population of genes
representing additional discrete points between those of said
starting templates.

2. The method of claim 1, wherein said homologies range from
less than 45%, 40%, 35%, 30%, 25%, 20%, or 15% to more than
80%, 85%, 90%, 95%, or 99%.



3. The method of claim 1 or 2, wherein said starting
population exhibits at least two identification sequences.



4. The method of any of the claims 1 to 3, wherein said
identification sequences correspond to amino acid sequences of
from 4 to 8 amino acid residues, which sequence is highly
conserved among the peptides encoded by said collection of
starting single or double stranded parental DNA templates,
preferably from 5 to 7 amino acid residues.



5. The method of claim 3 or 4, wherein said identification
sequences are located a distance apart corresponding to the
average size of the genes in said collection with a variation
of up to 40%.

6. The method of claim 5, wherein said variation is 20%,
15%, 10%, or 5%.

7. A method of identifying a polypeptide of interest
exhibiting improved properties in comparison to naturally
occurring or other known polypeptides of the same activity,
whereby a population of recombined polynucleotides produced by
a process according to any of the claims 1 to 6 is cloned into
an appropriate vector, said vector is transformed into a
suitable host system to be expressed into the corresponding
polypeptides, and said polypeptides are screened in a suitable
assay, and positive polypeptides selected.

8. A method for producing a polypeptide of interest as
identified according to claim 7, whereby a vector comprising a
polynucleotide encoding said identified polypeptide is
transformed into a suitable host, said host is grown to express
said polypeptide, and the polypeptide recovered and purified.

9. The method of claim 8, wherein said peptide is a protease,
especially a subtilase.

10. The method of claim 9, wherein said identification
sequences are located around the histidine in position 64 and the
active serine in position 221 of subtilisin BPN'.

11. The method of claim 8, wherein said peptide is an
amylase, especially an α-amylase.
12. The method of claim 11, wherein said identification
sequences are located around the Asp in position 100 and the Asp
in position 323 of B. licheniformis α-amylase.

12. The method of claim 11, wherein said identification
sequences are located around the Tyr in position 8 and around Ser
in position 476 of B. licheniformis α-amylase.

13. The method of claim 8, wherein said peptide is a lipase.

14. The method of claim 13, wherein said identification
sequences are located around the Pro in position 10 and around
the His in position 285 of P. glumae lipase.

15. The method of claim 8, wherein said peptide is a cellulase.

16. The method of claim 8, wherein said peptide is a xyla-